Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes.
Abstract
Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences, we have thoroughly studied DNA mutation patterns in the human genome. We analyzed a total of 1726 processed RP pseudogene sequences, comprising more than 700 000 bases. To be sure to differentiate the sequence changes occurring in the functional genes during evolution from those occurring in pseudogenes after they were fixed in the genome, we used only pseudogene sequences originating from parts of RP genes that are identical in human and mouse. Overall, we found that nucleotide transitions are more common than transversions, by roughly a factor of two. Moreover, the substitution rates amongst the 12 possible nucleotide pairs are not homogeneous as they are affected by the type of immediately neighboring nucleotides and the overall local G+C content. Finally, our dataset is large enough that it has many indels, thus allowing for the first time statistically robust analysis of these events. Overall, we found that deletions are about three times more common than insertions (3740 versus 1291). The frequencies of both these events follow characteristic power-law behavior associated with the size of the indel. However, unexpectedly, the frequency of 3 bp deletions (in contrast to 3 bp insertions) violates this trend, being considerably higher than that of 2 bp deletions. The possible biological implications of such a 3 bp bias are discussed.
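The quantities summarized in the abstract can be illustrated with a short sketch. This is not the authors' pipeline: the function names (`classify_substitution`, `count_substitutions`, `fit_power_law`) are hypothetical, the alignment is assumed to be an already-aligned pair of equal-length strings, and the power-law fit is a plain least-squares line in log-log space, which is only one of several ways to estimate such an exponent. Only the counts 3740 and 1291 come from the text.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine) for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid substitution: {ref!r} -> {alt!r}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(functional, pseudogene):
    """Count transitions and transversions between two aligned sequences.
    Columns containing gaps or ambiguous characters are skipped
    (an assumption of this sketch)."""
    if len(functional) != len(pseudogene):
        raise ValueError("aligned sequences must have equal length")
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in zip(functional.upper(), pseudogene.upper()):
        if ref in BASES and alt in BASES and ref != alt:
            counts[classify_substitution(ref, alt)] += 1
    return counts


def fit_power_law(sizes, freqs):
    """Least-squares fit of log(freq) = intercept + slope * log(size).
    For a power law freq ~ size**slope, the slope is negative."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Deletion-to-insertion ratio from the counts reported in the abstract.
deletion_to_insertion = 3740 / 1291
```

A departure from the power law, such as the 3 bp deletion excess described above, would show up as a point lying well above the fitted line when observed frequencies are compared with `exp(intercept) * size**slope`.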
Similar resources
Zhang and Gerstein Patterns of Nucleotide Substitution, Insertion and Deletion in the Human Genome Inferred from Pseudogenes
Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences, we have thoroughly studied DNA mutation patterns in the human genome. We analyzed a total of 1,726 processed RP pseudogene sequences, comprising more than 700,000 bases. To be sure to differentiate ...
Polymorphism Analysis Reveals Reduced Negative Selection and Elevated Rate of Insertions and Deletions in Intrinsically Disordered Protein Regions
Intrinsically disordered protein regions are abundant in eukaryotic proteins and lack stable tertiary structures and enzymatic functions. Previous studies of disordered region evolution based on interspecific alignments have revealed an increased propensity for indels and rapid rates of amino acid substitution. How disordered regions are maintained at high abundance in the proteome and across t...
HOPPSIGEN: a database of human and mouse processed pseudogenes
Processed pseudogenes result from reverse transcribed mRNAs. In general, because processed pseudogenes lack promoters, they are no longer functional from the moment they are inserted into the genome. Subsequently, they freely accumulate substitutions, insertions and deletions. Moreover, the ancestral structure of processed pseudogenes could be easily inferred using the sequence of their functio...
Pseudogene evolution and natural selection for a compact genome.
Pseudogenes are nonfunctional copies of protein-coding genes that are presumed to evolve without selective constraints on their coding function. They are of considerable utility in evolutionary genetics because, in the absence of selection, different types of mutations in pseudogenes should have equal probabilities of fixation. This theoretical inference justifies the estimation of patterns of ...
Association of Prolactin and Prolactin Receptor Gene Polymorphisms with Economic Traits in Breeder Hens of Indigenous Chickens of Mazandaran Province
Polymorphisms in the 5’-flanking region of the prolactin (PRL) gene and in exon 2 and exon 5 of the prolactin receptor (PRLR) gene, and their association with growth and egg traits, were examined in breeder hens of the Mazandaran native fowls breeding station. A single nucleotide polymorphism at site C-2402T and a 24 bp nucleotide sequence insertion at position -382 in 5’-flanking regions of PRL gene were id...
Journal:
- Nucleic Acids Research
Volume 31, Issue 18
Publication date: 2003